A file system API is an application programming interface through which a utility or user program requests services of a file system. An operating system may provide abstractions for accessing different file systems transparently.
Some file system APIs may also include interfaces for maintenance operations, such as creating or initializing a file system, verifying the file system for integrity, and defragmentation.
Each operating system includes the APIs needed for the filesystems it supports. Microsoft Windows has file system APIs for NTFS and several FAT file systems. Linux systems can include APIs for ext2, ext3, ReiserFS, and Btrfs, to name a few.
Some early operating systems were capable of handling only tape and disk file systems. These provided the most basic of interfaces with:
More coordination such as device allocation and deallocation required the addition of:
As file systems provided more services, more interfaces were defined:
As filesystem types, hierarchical structures and supported media multiplied, the additional features needed some specialized functions:
Multi-user systems required APIs for:
Writing user data to a file system is provided for use directly by the user program or by the run-time library. The run-time library for some programming languages may provide type conversion, formatting and blocking. Some file systems provide identification of records by key and may allow rewriting an existing record. This operation is sometimes called PUT, or PUTX if the record exists.
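As a rough illustration of PUT and PUTX on a keyed file, the following Python sketch uses the standard library's dbm.dumb module as a stand-in for a keyed file system. The class KeyedFile and its put, putx and get methods are hypothetical names chosen for this example, not the API of any particular file system.

```python
import dbm.dumb


class KeyedFile:
    """Hypothetical keyed-record file illustrating PUT / PUTX / GET semantics."""

    def __init__(self, path: str) -> None:
        # dbm.dumb keeps key/value pairs in ordinary files on disk ("c" = create if needed).
        self._db = dbm.dumb.open(path, "c")

    def put(self, key: bytes, record: bytes) -> None:
        """PUT: write a new record; refuse if a record with this key already exists."""
        if key in self._db:
            raise KeyError(f"record {key!r} already exists; use putx")
        self._db[key] = record

    def putx(self, key: bytes, record: bytes) -> None:
        """PUTX: rewrite an existing record; refuse if there is nothing to rewrite."""
        if key not in self._db:
            raise KeyError(f"record {key!r} does not exist; use put")
        self._db[key] = record

    def get(self, key: bytes) -> bytes:
        """GET by key; raises KeyError if the key is absent."""
        return self._db[key]

    def close(self) -> None:
        self._db.close()
```

Separating the two write operations lets the caller state its intent (insert versus update), so an accidental overwrite or a write to a missing record is reported instead of silently succeeding.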
Reading user data, sometimes called GET, may include a direction (forward or reverse) or, in the case of a keyed file system, a specific key. As with writing, run-time libraries may intercede for the user program.
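A minimal sketch of reading in either direction, assuming a file of fixed-length 8-byte records (an assumption for illustration; RECORD_SIZE and read_records are hypothetical names). Reverse reading is done here by repositioning before each read, which is one possible approach rather than how any particular run-time library does it.

```python
import os
from typing import Iterator

RECORD_SIZE = 8  # assumption: every record is exactly 8 bytes long


def read_records(path: str, reverse: bool = False) -> Iterator[bytes]:
    """GET each fixed-length record, either forward or in reverse order."""
    count = os.path.getsize(path) // RECORD_SIZE
    order = range(count - 1, -1, -1) if reverse else range(count)
    with open(path, "rb") as f:
        for i in order:
            f.seek(i * RECORD_SIZE)  # position at the start of record i
            yield f.read(RECORD_SIZE)
```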
Positioning includes adjusting the location of the next record. This may include skipping forward or reverse as well as positioning to the beginning or end of the file.
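On POSIX systems, positioning is exposed through lseek(), which Python wraps as os.lseek with the whence values os.SEEK_SET, os.SEEK_CUR and os.SEEK_END. The sketch below (the function name positioning_demo is hypothetical) skips forward, skips backward, moves to the end and rewinds a byte stream; record-oriented file systems would position by record rather than by byte.

```python
import os


def positioning_demo(path: str):
    """Exercise the basic positioning operations on a byte-stream file."""
    fd = os.open(path, os.O_RDONLY)
    try:
        first = os.read(fd, 4)                # read from the beginning
        os.lseek(fd, 4, os.SEEK_CUR)          # skip forward 4 bytes
        after_skip = os.read(fd, 4)
        os.lseek(fd, -8, os.SEEK_CUR)         # skip backward 8 bytes
        back = os.read(fd, 4)
        end = os.lseek(fd, 0, os.SEEK_END)    # position at end; returns the new offset
        os.lseek(fd, 0, os.SEEK_SET)          # rewind to the beginning
        rewound = os.read(fd, 4)
        return first, after_skip, back, end, rewound
    finally:
        os.close(fd)
```

For a 16-byte file containing 0123456789ABCDEF, it returns (b'0123', b'89AB', b'4567', 16, b'0123').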
The open API may be explicitly requested or implicitly invoked upon the issuance of the first operation by a process on an object. It may cause the mounting of removable media, establishing a connection to another host and validating the location and accessibility of the object. It updates system structures to indicate that the object is in use.
Usual requirements for requesting access to a file system object include:
Additional information may be necessary, for example a password. The request may also declare that other processes may access the same object while the opening process is using it (sharing), which may depend on the intent of the other process, or, in contrast, that no other process may access the object regardless of the other processes' intent (exclusive use). These are requested via a programming language library, which may provide coordination among modules in the process in addition to forwarding the request to the file system.
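Sharing and exclusive use can be approximated on Unix-like systems with advisory locks. The sketch below is an assumption-laden stand-in: open_with_sharing is a hypothetical helper, and flock() locks only exclude other openers that also request a lock, whereas some systems (for example, the share mode passed to CreateFile on Windows) enforce sharing at open time.

```python
import fcntl  # Unix only
import os


def open_with_sharing(path: str, exclusive: bool) -> int:
    """Open path read/write and request shared or exclusive use.

    Uses advisory flock() locks as a stand-in for an open-time sharing
    declaration: many shared holders may coexist, an exclusive holder may not.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    mode = fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH
    try:
        # LOCK_NB: fail at once (BlockingIOError) instead of waiting for the holder.
        fcntl.flock(fd, mode | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        raise
    return fd
```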
It must be expected that something may go wrong during the processing of the open.
Depending on the programming language, additional specifications in the open may establish the modules that handle these conditions. Some libraries specify a library module to the file system, permitting analysis should the opening program be unable to perform any meaningful action as a result of a failure. For example, if the failure occurs on the attempt to open the necessary input file, the only action may be to report the failure and abort the program. Some languages simply return a code indicating the type of failure; the program must always check this code and decide what to report and whether it can continue.
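Python reports open failures by raising OSError rather than returning a code. The hypothetical wrapper below converts the exception into a C-style return code to show the second style described above, in which the program must check the code and decide what to report. The path used is a made-up example.

```python
import errno
import os
from typing import Tuple


def open_input(path: str) -> Tuple[int, int]:
    """Return (fd, 0) on success or (-1, errno) on failure, C-style."""
    try:
        return os.open(path, os.O_RDONLY), 0
    except OSError as e:
        return -1, e.errno


# The caller must check the code before using the descriptor.
fd, err = open_input("/nonexistent/input.dat")  # hypothetical input file
if err == errno.ENOENT:
    print("input file not found; cannot continue")
elif err:
    print("open failed:", os.strerror(err))
else:
    os.close(fd)
```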
Close may cause un-mounting or ejecting of removable media and will update library and file system structures to indicate that the object is no longer in use. The minimal specification to the close references the object. Additionally, some file systems allow specifying a disposition of the object, which may indicate that the object is to be discarded and no longer be part of the file system. As with the open, it must be expected that something may go wrong.
Considerations for handling a failure are similar to those of the open.
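POSIX close() takes only the descriptor, so it has no disposition argument. The hypothetical helper below models a "delete" disposition by unlinking the file after closing it; any failure from close() or unlink() surfaces as an OSError, to be handled as for the open. Windows offers a related facility through the FILE_FLAG_DELETE_ON_CLOSE flag to CreateFile.

```python
import os


def close_with_disposition(fd: int, path: str, delete: bool = False) -> None:
    """Close fd and, if delete is True, discard the file from the file system.

    Hypothetical helper: the disposition is emulated with a separate unlink().
    Errors from either call propagate as OSError.
    """
    os.close(fd)
    if delete:
        os.unlink(path)
```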
Information about the data in a file is called metadata.
Some of the metadata is maintained by the file system, for example the last-modification date (and various other dates, depending on the file system), the location of the beginning of the file, the size of the file, and whether the file system backup utility has saved the current version of the file. These items cannot usually be altered by a user program.
Additional metadata supported by some file systems may include the owner of the file, the group to which the file belongs, permissions and/or access control (i.e., what access and updates various users or groups may perform), and whether the file is normally visible when the directory is listed. These items are usually modifiable by file system utilities, which may be executed by the owner.
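On POSIX systems, much of the metadata above is returned by stat(). The sketch below (describe and make_read_only are hypothetical names) reads file-system-maintained items such as size and modification time, and then uses chmod(), the kind of operation an owner may perform, to clear the write-permission bits.

```python
import os
import stat
import time


def describe(path: str) -> dict:
    """Collect a few metadata items for path."""
    st = os.stat(path)
    return {
        "size": st.st_size,                        # maintained by the file system
        "modified": time.ctime(st.st_mtime),       # last-modification date
        "owner_uid": st.st_uid,                    # owner (numeric user ID)
        "group_gid": st.st_gid,                    # group the file belongs to
        "permissions": stat.filemode(st.st_mode),  # e.g. "-rw-r--r--"
    }


def make_read_only(path: str) -> None:
    """Clear the user, group and other write bits, keeping the rest."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))
```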
Some applications store more metadata. For images, the metadata may include the camera model and the settings used to take the photo. For audio files, the metadata may include the album, the recording artist, and comments about the recording, which may be specific to a particular copy of the file (i.e., different copies of the same recording may have different comments as updated by the owner of the file). Documents may include items like checked-by, approved-by, etc.
Renaming a file, moving a file (or a subdirectory) from one directory to another, and deleting a file are examples of the operations provided by the file system for the management of directories.
Metadata operations, such as permitting or restricting access to a directory by various users or groups of users, are usually included.
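The directory operations named above map onto familiar POSIX calls. This sketch assumes a hypothetical directory containing draft.txt and scratch.tmp; it renames a file, moves it into a new subdirectory, deletes another file, and then restricts the subdirectory to its owner.

```python
import os
import stat


def reorganize(root: str) -> None:
    """Apply a few directory-management operations under root (hypothetical layout)."""
    archive = os.path.join(root, "archive")
    os.mkdir(archive)
    # Rename a file within the same directory.
    os.rename(os.path.join(root, "draft.txt"), os.path.join(root, "report.txt"))
    # Move the file into a subdirectory (a rename across directories).
    os.rename(os.path.join(root, "report.txt"), os.path.join(archive, "report.txt"))
    # Delete a file.
    os.unlink(os.path.join(root, "scratch.tmp"))
    # Metadata operation: only the owner may read, write or search the directory.
    os.chmod(archive, stat.S_IRWXU)
```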
As a file system is used, directories, files and records may be added, deleted or modified. This usually causes inefficiencies in the underlying data structures, such as logically sequential blocks distributed across the media in a way that causes excessive repositioning, or partially used or even empty blocks included in linked structures. Incomplete structures or other inconsistencies may be caused by device or media errors, inadequate time between detection of an impending loss of power and the actual power loss, improper system shutdown or media removal, and, on very rare occasions, file system coding errors.
Specialized routines in the file system are included to optimize or repair these structures. They are not usually invoked by the user directly but are triggered within the file system itself. Internal counters, such as the number of levels of a structure or the number of inserted objects, may be compared against thresholds. Exceeding a threshold may cause user access to a specific structure to be suspended (usually to the displeasure of the affected users), or the routines may be started as low-priority asynchronous tasks or deferred to a time of low user activity. Sometimes these routines are invoked or scheduled by the system manager, as in the case of defragmentation.
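The threshold logic described above can be sketched as a small decision function. Everything here is hypothetical (IndexStats, MaintenancePolicy, the threshold values and the action names); real file systems keep different counters and apply their own policies. The sketch only decides what to do and does not perform the reorganization.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class IndexStats:
    """Hypothetical counters kept for one internal structure (e.g. a directory index)."""
    levels: int = 1
    inserts_since_reorg: int = 0


@dataclass
class MaintenancePolicy:
    """Hypothetical thresholds plus a queue of deferred work."""
    max_levels: int = 4
    max_inserts: int = 1000
    deferred: List[str] = field(default_factory=list)

    def check(self, name: str, stats: IndexStats, busy: bool) -> str:
        """Compare the counters with the thresholds and pick an action."""
        if stats.levels > self.max_levels:
            # Too deep to use efficiently: suspend access and rebuild now.
            return "reorganize-now"
        if stats.inserts_since_reorg > self.max_inserts:
            if busy:
                # Users are active: defer to a period of low activity.
                self.deferred.append(name)
                return "deferred"
            # Quiet system: run as a low-priority background task.
            return "reorganize-background"
        return "ok"
```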
The API is "kernel-level" when the kernel not only provides the interfaces for filesystem developers but is also the space in which the filesystem code resides.
It differs from the old scheme in that the kernel uses its own facilities to communicate with the filesystem driver and vice versa, rather than the kernel handling the filesystem layout while the filesystem code accesses the hardware directly.
It is not the cleanest scheme, but it avoids the major rewrites that the old scheme required.
With modular kernels, it allows filesystems to be added as any other kernel module, even third-party ones. With non-modular kernels, however, it requires the kernel to be recompiled with the new filesystem code (and in closed-source kernels, this makes third-party filesystems impossible).
Unixes and Unix-like systems such as Linux have used this modular scheme.
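In a modular kernel, a filesystem module typically registers itself with the kernel when it is loaded, and the kernel dispatches file operations to it by filesystem type. The Python sketch below is only a user-space analogy: the function names echo Linux's register_filesystem() and unregister_filesystem(), but vfs_read, RamFS and the dispatch table are hypothetical and much simpler than any real kernel interface.

```python
from typing import Dict, Protocol


class FileSystemDriver(Protocol):
    """What the (simulated) kernel expects every filesystem module to provide."""

    def read(self, path: str) -> bytes: ...


# Dispatch table kept by the simulated kernel, keyed by filesystem type.
_registry: Dict[str, FileSystemDriver] = {}


def register_filesystem(fstype: str, driver: FileSystemDriver) -> None:
    """Called by a module when it is loaded."""
    if fstype in _registry:
        raise ValueError(f"filesystem type {fstype!r} is already registered")
    _registry[fstype] = driver


def unregister_filesystem(fstype: str) -> None:
    """Called by a module when it is unloaded."""
    del _registry[fstype]


def vfs_read(fstype: str, path: str) -> bytes:
    """Kernel entry point: forward the request to the registered driver."""
    driver = _registry.get(fstype)
    if driver is None:
        raise LookupError(f"unknown filesystem type {fstype!r}")
    return driver.read(path)


class RamFS:
    """A toy third-party driver that serves files from a dict."""

    def __init__(self, files: Dict[str, bytes]) -> None:
        self._files = files

    def read(self, path: str) -> bytes:
        return self._files[path]
```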
There is a variation of this scheme used in MS-DOS (DOS 4.0 onward) and compatibles to support CD-ROM and network filesystems. Instead of adding code to the kernel, as in the old scheme, or using kernel facilities, as in the kernel-based scheme, it traps all calls to a file and determines whether each call should be redirected to the kernel's equivalent function or handled by the specific filesystem driver; the filesystem driver then "directly" accesses the disk contents using low-level BIOS functions.
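The DOS variation can be pictured as a wrapper that intercepts every file call and chooses between the kernel's own function and a filesystem driver. In this hypothetical sketch, make_redirector, kernel_open and the drive-letter table are illustrative names, and routing by drive letter is an assumption; the actual DOS redirector mechanism is not reproduced here.

```python
from typing import Callable, Dict, Optional

Handler = Callable[[str], object]


def make_redirector(kernel_open: Handler, drivers: Dict[str, Handler]) -> Handler:
    """Return an open() replacement that traps every call."""

    def trapped_open(path: str) -> object:
        # Hypothetical routing rule: decide by drive letter ("E:\...").
        drive: Optional[str] = None
        if len(path) >= 2 and path[1] == ":":
            drive = path[0].upper()
        handler = drivers.get(drive) if drive else None
        if handler is not None:
            return handler(path)   # redirected to the filesystem driver
        return kernel_open(path)   # not ours: chain to the kernel's function

    return trapped_open
```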
The API is "driver-based" when the kernel provides facilities but the filesystem code resides totally external to the kernel (not even as a module of a modular kernel).
It is a cleaner scheme because the filesystem code is totally independent; it allows filesystems to be created for closed-source kernels and to be added to or removed from the system online.
Examples of this scheme are the respective IFSs (installable file systems) of Windows NT and OS/2.
In this API, all filesystems are in the kernel, as in kernel-based APIs, but the OS automatically traps them with another API that is driver-based.
This scheme was used in Windows 3.1 to provide a 32-bit protected-mode, cached FAT filesystem driver (VFAT) that completely bypassed the DOS FAT driver in the kernel (MSDOS.SYS). It was later used in the Windows 9x series (95, 98 and Me) for VFAT, the ISO 9660 filesystem driver (along with Joliet), network shares, and third-party filesystem drivers, and to add the LFN (long file name) API to the original DOS APIs: IFS drivers can not only intercept the existing DOS file APIs but also add new ones from within the 32-bit protected-mode executable.
However, that API was not completely documented, and third parties found themselves in a "do-it-yourself" scenario even worse than with kernel-based APIs.
The API is in user space when the filesystem does not directly use kernel facilities but instead accesses disks using high-level operating system functions, and provides functions in a library that a series of utilities use to access the filesystem.
This is useful for handling disk images.
The advantage is that a filesystem can be made portable between operating systems as the high-level operating system functions it uses can be as common as ANSI C, but the disadvantage is that the API is unique to each application that implements one.
Examples of this scheme are the hfsutils and the adflib.
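A user-space filesystem library needs nothing more than ordinary file I/O to read a disk image, which is what makes it portable. The sketch below assumes 512-byte sectors and checks the 0x55 0xAA boot signature at the end of sector 0; read_sector and has_boot_signature are hypothetical helpers, not functions from hfsutils or adflib.

```python
SECTOR_SIZE = 512  # assumption: the image uses 512-byte sectors


def read_sector(image_path: str, lba: int) -> bytes:
    """Read one sector from a disk image using only ordinary file I/O."""
    with open(image_path, "rb") as f:
        f.seek(lba * SECTOR_SIZE)
        data = f.read(SECTOR_SIZE)
    if len(data) != SECTOR_SIZE:
        raise ValueError(f"sector {lba} is beyond the end of the image")
    return data


def has_boot_signature(image_path: str) -> bool:
    """Check for the 0x55 0xAA signature in the last two bytes of sector 0."""
    return read_sector(image_path, 0)[510:512] == b"\x55\xaa"
```

Because only open, seek and read are used, the same code runs on any operating system with a C-style file library, but its functions are an API private to this sketch.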
As all filesystems (at least the disk ones) need equivalent functions provided by the kernel, it is possible to port filesystem code easily from one API to another, even if the APIs are of different types.
For example, the ext2 driver for OS/2 is simply a wrapper that maps Linux's VFS onto OS/2's IFS around Linux's kernel-based ext2 code, and the HFS driver for OS/2 is a port of hfsutils to OS/2's IFS. There is also a project that uses a Windows NT IFS driver to make NTFS work under Linux.
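Porting between APIs often amounts to writing an adapter. In the hypothetical sketch below, InodeStyleDriver stands for a filesystem written against one API (look up an inode, then read it at an offset), and PathStyleAdapter exposes it through a different API (open by path, then read sequentially through a handle). Neither class corresponds to the real Linux VFS or OS/2 IFS interfaces.

```python
from typing import Dict, List


class InodeStyleDriver:
    """Hypothetical driver for API A: look up an inode, then read at an offset."""

    def __init__(self, tree: Dict[str, bytes]) -> None:
        self._names: Dict[str, int] = {}
        self._inodes: Dict[int, bytes] = {}
        for ino, (path, data) in enumerate(tree.items(), start=1):
            self._names[path] = ino
            self._inodes[ino] = data

    def lookup(self, path: str) -> int:
        return self._names[path]

    def read_inode(self, ino: int, offset: int, length: int) -> bytes:
        return self._inodes[ino][offset:offset + length]


class PathStyleAdapter:
    """Exposes API B (open by path, sequential read, close) on an API A driver."""

    def __init__(self, driver: InodeStyleDriver) -> None:
        self._driver = driver
        self._handles: Dict[int, List[int]] = {}  # handle -> [inode, current offset]
        self._next_handle = 1

    def open(self, path: str) -> int:
        handle = self._next_handle
        self._next_handle += 1
        self._handles[handle] = [self._driver.lookup(path), 0]
        return handle

    def read(self, handle: int, length: int) -> bytes:
        entry = self._handles[handle]
        data = self._driver.read_inode(entry[0], entry[1], length)
        entry[1] += len(data)  # the adapter, not the driver, tracks the position
        return data

    def close(self, handle: int) -> None:
        del self._handles[handle]
```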